Sociology 229:  Advanced Regression Models

 

Short Assignment #1:  Multinomial Logistic Regression

 

Due:  Start of class (9:00) January 19

 

This assignment requires a dataset on the course website entitled “Assignment 1 Multinomial Data.dta”.  The dataset includes information on approximately 1,200 Protestant respondents from the GSS.  It includes a variable “religid” that indicates whether the respondent is a “fundamentalist”, “evangelical”, “mainline”, or “liberal” Protestant.

 

  1. Download the Assignment 1 dataset
  2. Create your own “do” file that opens the data
  3. Use the “tabulate” command to examine the variable religid, which will be the dependent variable of your analyses.  Try the option “nolabel” to identify the actual numerical codes for each category.
    1. tab religid
    2. tab religid, nolabel
  4. Run a multinomial logistic regression model looking only at the effect of education on religious identification among Protestants
    1. mlogit religid educ
    2. Note that Stata chooses “mainline” Protestants as the reference group for the analysis.  By default, Stata uses the largest (most frequent) group as the base outcome.  This may or may not provide the contrasts you are interested in.  You can manually specify the reference group with the option “baseoutcome”.
  5. Run a multinomial logistic regression with additional independent variables:  education, income, gender (female dummy), age, and frequency of religious attendance
    1. mlogit religid educ income dfemale age attend
  6. Run the same model, but request “relative risk ratios” instead of raw coefficients
    1. mlogit religid educ income dfemale age attend, rrr
  7. Run the same analysis, but choose “fundamentalist” Protestants as the base outcome
    1. mlogit religid educ income dfemale age attend, baseoutcome(1)
  8. You can get somewhat similar results with a series of binary logistic regression models.  Run a logistic regression model looking just at the contrast between mainline and fundamentalist Protestants.  Note:  this requires that you create a new dependent variable, coded 1 for fundamentalist, 0 for mainline, and missing for all other respondents.
    1. gen fundvsmainline = 1 if religid == 1
    2. replace fundvsmainline = 0 if religid == 3
    3. logit fundvsmainline educ income dfemale age attend
  9. Answer questions below.
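
The steps above can be collected into a single “do” file along the following lines.  This is only a sketch:  it assumes the dataset has been saved in Stata's current working directory, and the log file name “assignment1.log” is a made-up example.  The extra “tab” command in step 8 is not required; it is included as a check that the new variable is coded as intended.

```stata
* Sketch of a do-file for Assignment 1.
* Assumes the .dta file is in the current working directory.
clear all
capture log close
log using "assignment1.log", replace text   // hypothetical log file name

use "Assignment 1 Multinomial Data.dta", clear

* Step 3: examine the dependent variable, with and without value labels
tab religid
tab religid, nolabel

* Step 4: education only
mlogit religid educ

* Step 5: full model, raw coefficients
mlogit religid educ income dfemale age attend

* Step 6: same model, reported as relative risk ratios
mlogit religid educ income dfemale age attend, rrr

* Step 7: same model with fundamentalist (code 1) as the base outcome
mlogit religid educ income dfemale age attend, baseoutcome(1)

* Step 8: binary contrast, fundamentalist (1) vs. mainline (0);
* all other respondents are left missing and drop out of the logit
gen fundvsmainline = 1 if religid == 1
replace fundvsmainline = 0 if religid == 3
tab religid fundvsmainline, missing   // optional check of the coding
logit fundvsmainline educ income dfemale age attend

log close
```

Running the file from top to bottom reproduces every piece of output in order, which makes it easy to locate the step #5 results in the log.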

 

 

Question 1:  Based on the simplest model (with education only), which type of Protestant is the most educated?  Which is the least?  Which differences are statistically significant (compared to mainline Protestants)?  Can you draw statistical inferences about the difference between fundamentalist and evangelical Protestants from this analysis?

 

Question 2:  Interpret the coefficient for gender (female dummy) on the choice between mainline and fundamentalist Protestantism.  Discuss the raw coefficient (which indicates direction), the relative risk ratio (which is analogous to an odds ratio), and the % difference in relative risk.  Do the results differ when you shift the reference outcome from mainline to fundamentalist?
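
As a reminder of how the three quantities in Question 2 relate, the following sketch uses generic notation (j for an outcome, b for the base outcome) and a made-up coefficient of 0.25.  The value is purely illustrative and is not taken from the assignment data.

```latex
% Multinomial logit: log of the relative risk of outcome j versus base outcome b
\ln \frac{\Pr(y = j)}{\Pr(y = b)} = \mathbf{x}'\boldsymbol{\beta}_{j}

% Relative risk ratio and percent change for a one-unit increase in a predictor
\mathrm{RRR} = e^{\beta}, \qquad \%\Delta = 100\,\bigl(e^{\beta} - 1\bigr)

% Illustration with a hypothetical coefficient \beta = 0.25
e^{0.25} \approx 1.284, \qquad 100\,(1.284 - 1) \approx 28.4\%
```

A positive coefficient gives a relative risk ratio above 1 and a positive percent change; a negative coefficient gives a ratio below 1 and a negative percent change (for example, a coefficient of -0.25 gives a ratio of about 0.779, or roughly a 22.1% decrease).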

 

Question 3:  Comment briefly on the consequences of changing the reference outcome from mainline to fundamentalist Protestants:  Generalizing from your experience in Question 2, what happens to coefficients for mainline vs. fundamentalist Protestants when you swap the reference groups?  Also:  Which contrasts can you directly examine once fundamentalist Protestants are the reference group (that could not be easily made when mainline Protestants were the reference group)?  Which contrasts can no longer be directly assessed?

 

Question 4:  Comment briefly (2-3 sentences) on the binary logistic regression results.  Were they similar to the multinomial results overall?

 

Turn in the following:

  1. Your “do” file, containing all commands you used for this assignment.
  2. The output from step #5, above.
  3. Answers to the questions.